Method and apparatus for sound enhancement with envelopes of multiband-passed signals feeding comb filters

ABSTRACT

Sound is processed to enhance wanted sound with respect to unwanted sound. The sound is distributed over a plurality of parallel pass bands. In each channel, possibly excepting the lowest frequency channels, the envelope of the respective signal in that frequency band is detected. Next, the envelope or, in the lowest frequency channels, the signal itself is preferentially filtered for enhancing signals at the fundamental frequency of the wanted sound. Subsequently, as far as applicable, the filtered signal is modulated with the envelope found for the channel in question, and all channel outputs are summed.

BACKGROUND OF THE INVENTION

The invention relates to a method for processing source sound for therein enhancing wanted sound with respect to unwanted sound, said method comprising the steps of:

distributing said source sound over a plurality of bandpass filters in as many channels in parallel;

in each channel applying a respective filter means for preferentially filtering the wanted sound with respect to the unwanted sound in that channel's frequency band;

aggregating output signals of said channels to an enhanced output sound.

First, the wanted sound may be speech, or more generally, such sound to which a particular pitch may be attributed. Sound having no such pitch is left out of consideration as a target for being enhanced. Now, sound enhancing is improving the signal-to-noise ratio, wherein the noise may be another sound or voice than the one to be enhanced, music, noises generated by identifiable objects such as machines, or just physically present noise, of which the source is unknown or indistinct. Such enhancing intends to make the wanted sound better comprehensible, more agreeable or otherwise more suitable. It would be feasible to enhance the sound of a particular musical instrument with respect to other instruments. The result of the enhancing may be used per se. Another application would be to subtract the enhanced signal from the source signal for subsequently using or further processing of the subtraction result.

The described straightforward method may succeed for low frequencies that are coupled to the pitch of the signal in question, whether wanted or unwanted. Higher harmonics, however, cause problems of various nature. First, the phase of such higher harmonics is less precisely coupled to the basic pitch period; in extreme cases, the phase itself is subject to noisy phenomena. Therefore, such methods would attribute to these latter noisy phenomena a certain harmonic structure. This would, in its turn, cause disturbances in the higher frequency range of the wanted signal, and effectively attenuate higher-frequency components thereof. This effectively would render the recited solution imperfect with respect to the objects recited supra.

SUMMARY OF THE INVENTION

Accordingly, amongst other things it is an object of the invention to provide a straightforward speech enhancing method that may be easily adapted to actual needs and allows for a broad field of applications. Now, according to one of its aspects, the method of the invention is characterized by:

feeding each bandpass filter's output to an envelope detecting means to feed that channel's filter means;

feeding each respective filter means' output to an envelope modulating means to generate that channel's output signal.

The philosophy of the present invention is that at higher frequencies the phase of the envelope rather than the phase of the signal itself is coupled to the pitch period. Unwanted signals should therefore be filtered out by adaptively filtering the envelopes of the respective frequency bands rather than the signal itself.

Advantageously, said filter means comprise comb filter means. Now, single channel comb filtering on the signal itself has been described in J. S. Lim et al., Evaluation of an adaptive comb filtering method for enhancing speech degraded by white noise addition, IEEE Transactions on Acoustics, Speech and Signal Processing, Volume ASSP 26 (1978), pages 354-358. The present solution is to apply filtering, in particular, but not limited to comb filtering, in a plurality of parallel channels, as executed on the signal envelopes. A slightly different solution is to replace the comb filtering by harmonical selection. If the wanted signal is stationary, the two methods are mathematically equivalent, and the term used in the Claim would also cover the latter technology. In particular, the latter technology relates to a change from the time domain to the spectral frequency domain. If the wanted signal, however, is non-stationary, the translation to harmonical selection is no longer correct. For the correctness of the comb-filtering approach proper, however, the wanted signal need not be stationary. Now, the above methods apply because it has been found that encoding a signal and reconstruction thereof by means of the envelopes of the various frequency bands will produce a wanted signal practically without audible distortion. By itself, multirate filtering for subband coding/decoding has been described in Martin Vetterli, A Theory of Multirate Filter Banks, IEEE Transactions on Acoustics, Speech and Signal Processing, Volume ASSP 35, No. 3, March 1987, pages 356-372.

The invention also relates to an apparatus for speech enhancement comprising a first plurality of channels assigned to respective contiguous frequency bands, said apparatus comprising distributing means for distributing said source sound over said channels, each channel comprising:

bandpass filter means at a frequency of the associated channel;

envelope detecting means fed by the channel's bandpass filter means;

comb filter means fed by the channel's envelope detecting means;

envelope modulating means fed by the channel's filter means;

said apparatus furthermore having output means fed by outputs of all channels in parallel. Such apparatus would find useful application for speech and music processing, for example for reproduction purposes, both real-time and in recording, for information dissemination, education, entertainment, psychology, musicology, linguistics, historical studies and forensic investigation.

Various advantageous aspects are recited in dependent Claims. In all of the instances, the enhancement always is a relative one, that may be combined with amplification or attenuation of the wanted signal itself.

BRIEF DESCRIPTION OF THE DRAWINGS

For a fuller understanding of the invention, reference is had to the following description taken in connection with the accompanying drawings, in which:

FIGS. 1a-1c represent various signal diagrams that are relevant in the embodiment;

FIGS. 2a-2d represent various response diagrams that are relevant to the embodiment;

FIG. 3 is a block diagram of an apparatus according to the invention.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT

FIG. 1a is an amplitude versus time signal of a speech sample that is exclusively shown by way of example. Time as well as amplitude should only be considered as relative quantities, inasmuch as the invention is directed to various kinds of signal sources although speech is an important field of use. However, all kinds of other sounds would apply that have physical sources of more complicated nature than those that produce pure harmonics.

FIG. 1b shows the same signal as FIG. 1a, but now transposed to the frequency domain. The frequency range is 0-5000 Hertz on a linear scale. Amplitude is relative; in this respect the Figure is illustrative, not calibrated. Curve 1b1 is the logarithm of the spectral amplitude as a function of frequency f. At lowest frequencies the amplitude is extremely low. At intermediate frequencies, the amplitude is sometimes high and sometimes low. Much variation exists, however. At high frequencies, the amplitude gradually sinks, but not without further variation. Curve 1b2 is the spectral envelope of the signal that had caused curve 1b1, again as a function of frequency. For better clarity, curve 1b2 has been given some upward shift with respect to curve 1b1. Notably, the variations in curve 1b2 are much smoother than those in curve 1b1. The peaks in the envelope generally correspond to the so-called formant frequencies of speech. For discussion on the formant phenomena, reference is had to standard textbooks on speech analysis. Curves 1b3 represent bandpass filters for each of the five respective formant frequencies. Bandwidth is approximately 500 Hertz. The flat parts of the transmission curves represent essentially 100% transmission. In an actual optimum embodiment of the present invention, there would be more of these bandpass filters, so that the full acoustic energy would be transmitted. The passbands also would be narrower and closer to each other (about just as far as the two passbands associated to the two highest formant frequencies). In practice, widths of 1/3 of an octave would be most logical for perceptive reasons. Anyway, the aggregated transmission curve of all passband filters combined should not have holes, but should be essentially flat with respect to frequency.
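By way of illustration only, the layout of contiguous passbands of 1/3 octave width may be sketched as follows. The function name, the lower edge of 100 Hertz and the use of the geometric mean as centre frequency are assumptions made for this sketch and are not taken from the embodiment; the upper limit of 5000 Hertz merely follows the range of FIG. 1b.

```python
def fractional_octave_bands(f_low, f_high, fraction=3):
    """Return (lower, centre, upper) edges in Hertz of contiguous bands.

    Each band is 1/fraction octave wide. Hypothetical helper for
    illustration; the band edges are an assumption, not a specification.
    """
    ratio = 2.0 ** (1.0 / fraction)  # upper edge divided by lower edge
    bands = []
    lower = f_low
    while lower < f_high:
        upper = lower * ratio
        centre = (lower * upper) ** 0.5  # geometric mean of the edges
        bands.append((lower, centre, upper))
        lower = upper  # next band starts where this one ends: no holes
    return bands


# Assumed range: 100 Hertz up to the 5000 Hertz limit of FIG. 1b.
bands = fractional_octave_bands(100.0, 5000.0)
for lower, centre, upper in bands:
    print(f"{lower:8.1f} {centre:8.1f} {upper:8.1f}")
```

Because every band starts exactly at the upper edge of its predecessor, the aggregate of ideal passbands leaves no holes; the flatness actually attained would depend on the filter shapes chosen.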

FIG. 1c shows five curve pairs, each pair associated to a particular one of the five formant frequencies of curve 1b2. Of each pair, the lower curve represents the transmitted amplitude of the signal itself. The upper curve (shifted vertically somewhat) represents the amplitude envelope of the transmitted signal. The upper pair is associated to the basic pitch of the speech sound in question as passed by an appropriate bandpass filter. Common pitch frequencies for adult male voice are 50-200 Hertz, although lower values are not uncommon. Female and juvenile voices have substantially higher pitches, 150-300 Hertz for females, up to 400 Hertz for children, while soprano pitch may incidentally rise to 1200 Hertz. Now, as shown, the signal itself is modulated with an almost periodical amplitude. The envelope is periodic with the pitch frequency. Such pitch variation as exists is slow relative to the pitch period. The next pair of curves symbolizes the speech signal of the next higher formant frequency with respect to the pitch (roughly the 2½th harmonic in this example). On the one hand, the phase with respect to the pitch shows some fluctuation with time, and also, the signal shape is less sinusoidal than of the first formant. This phenomenon grows still more clear for the curve pairs associated to the highest frequency formants F3, F4, F5: although the gross shape (= related to the envelope) is rather periodic, this does not apply to the signal itself, which is very non-periodic. At the highest frequency formants even the envelope gets seriously non-periodic. This means that large phase variations occur. In consequence, the present invention uses the envelope of the high frequency bands for further processing. Generally, non-speech signals would lead to similar signal diagrams.

FIG. 2a exemplifies the impulse response of a comb filter. The heights of the respective peaks add to 1. The output of the filter is the convolution of the input signal with the transmission coefficients of the respective comb teeth. The interval between contiguous teeth is the known or measured pitch period of the input signal. Therefore, at constant pitch, the comb is generally symmetric, although this requirement is not completely strict. Generally, response coefficients get lower at a further distance from the centre. The number of coefficients has been chosen as an odd value of 7, but other values, including even values, are applicable as well. Generally, the layout of FIG. 2a is rather arbitrary. The repetition of the comb filter's application is arbitrary, but usually faster than the pitch frequency itself.
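A minimal sketch of such a comb filter is given hereinafter, under stated assumptions: the triangular taper of the tooth heights, the constant pitch period expressed as a whole number of samples, and the names comb_coefficients and comb_filter are chosen for illustration only and are not prescribed by FIG. 2a. Teeth falling outside the signal are simply skipped.

```python
def comb_coefficients(n_teeth=7):
    """Symmetric tooth heights that fall off from the centre and add to 1.

    The triangular taper is an assumption; FIG. 2a only requires lower
    coefficients at a further distance from the centre.
    """
    centre = (n_teeth - 1) / 2.0
    raw = [centre + 1.0 - abs(k - centre) for k in range(n_teeth)]
    total = sum(raw)
    return [r / total for r in raw]


def comb_filter(signal, period, coeffs):
    """Convolve signal with teeth spaced `period` samples apart.

    The comb is centred on the current sample; `period` is the known or
    measured pitch period in samples (assumed constant here).
    """
    half = (len(coeffs) - 1) / 2.0
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            idx = n + int(round((k - half) * period))
            if 0 <= idx < len(signal):  # teeth beyond the edges are skipped
                acc += c * signal[idx]
        out.append(acc)
    return out


coeffs = comb_coefficients(7)

# A signal that repeats every 10 samples passes unchanged away from the edges.
periodic = [float(n % 10) for n in range(200)]
passed = comb_filter(periodic, 10, coeffs)

# An isolated impulse, not repeating at the pitch period, is attenuated.
impulse = [0.0] * 100
impulse[50] = 1.0
spread = comb_filter(impulse, 10, coeffs)
```

In this sketch a component that repeats at the pitch period is reproduced with unit gain because the coefficients add to 1, whereas an isolated disturbance is reduced to the height of a single tooth.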

FIG. 2b, at left, shows an infinite pulse train in time (= horizontal axis). At right, FIG. 2b shows the Fourier-transform thereof: this is an infinite number of identical pulses drawn only at the right hand side of the frequency axis.

FIG. 2c, at left, shows an exemplary window function in time. At right, FIG. 2c shows the Fourier-transform at about the same scale as the Fourier-transform in FIG. 2b. The result here is a relatively narrow peak that is symmetric around the zero point of the frequency axis.

FIG. 2d, at left, shows the signal that is transmitted when the window function of FIG. 2c operates on the pulse train of FIG. 2b. Likewise, at right, FIG. 2d shows the result of convolving the Fourier-transforms of the pulse train in FIG. 2b and of the window in FIG. 2c. The right hand side of FIG. 2d now is the Fourier-transform of the left hand side of FIG. 2d.
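The relation among FIGS. 2b-2d may be checked numerically with a small sketch. The record length of 240 samples, the pulse period of 20 samples, the Hann window and the direct discrete Fourier transform are all assumptions chosen for illustration; the Figures themselves are not calibrated.

```python
import cmath
import math


def dft_magnitude(x):
    """Magnitude of the discrete Fourier transform for bins 0 .. len(x)//2."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


N, P = 240, 20  # assumed record length and pulse period, in samples

# Cf. FIG. 2b: pulse train, one unit pulse per period.
pulses = [1.0 if t % P == 0 else 0.0 for t in range(N)]

# Cf. FIG. 2c: window function (a Hann window is assumed here).
window = [0.5 - 0.5 * math.cos(2 * math.pi * t / N) for t in range(N)]

# Cf. FIG. 2d: the window operating on the pulse train.
windowed = [p * w for p, w in zip(pulses, window)]

spec_pulses = dft_magnitude(pulses)
spec_windowed = dft_magnitude(windowed)

# Harmonics of the pulse rate fall on every (N // P)-th bin.
for k in range(0, N // 2 + 1, N // P):
    print(k, round(spec_pulses[k], 3), round(spec_windowed[k], 3))
```

In this sketch the spectrum of the bare pulse train is non-zero only at multiples of the pulse rate, while the windowed train shows, at each such multiple, a copy of the narrow window spectrum; this is the convolution described for the right hand side of FIG. 2d.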

Now, FIG. 3 is a block diagram of an apparatus according to the invention. Therein, input means 20 receive the source sound containing the wanted sound to be enhanced on which unwanted sound is superposed. The input may represent microphones or similar transducers, a digital or analog audio transmission channel, or other conventional apparatus. Items 22-30 are a plurality of bandpass filters that have contiguous passbands so that collectively they pass all acoustic energy within the frequency range of interest. Such range need not comprise necessarily all energy on input means 20 and the aggregate transmission coefficient flatness may be chosen according to intended accuracy or other useful criterion. The number of filters is arbitrary, but may be, for example, 32 or 64. In that case, the half-height width of the response curves may be, for example, 1/10-1/3 of an octave. The filters may operate according to digital or analog methods.

Array 32 comprises envelope detecting means, for example realized as down-sampling means. In practice, this operates as a demodulator. Down-sampling has been given in the Vetterli reference, op. cit. Another easy procedure is double sided rectifying followed by a smoothing procedure. The time constant of the smoothing is comparable to the bandwidth of the band in question. Next, the smoothed signal is sampled at a somewhat lower recurrency. In addition to the five channels so discussed, there are two exemplary additional channels shown that have bandpass filters 60, 62, but no envelope detectors in array 32. The latter channels are applied for the spectrum part where the phase of the signal is invariant. In practice, this is the low-frequency part, for example, for speech, everything below 1250 Hertz, depending on the kind of sound that is being processed. In particular, the width of all bandpass filters is equal as measured in octaves.
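The rectify-and-smooth procedure may be sketched as follows, by way of example only. The moving-average smoother, the sampling rate of 16000 Hertz, the test signal (a 2000 Hertz carrier whose amplitude varies at a pitch of 100 Hertz) and the name detect_envelope are assumptions of this sketch; any other low-pass smoothing would serve equally.

```python
import math


def detect_envelope(band_signal, smooth_len, step):
    """Double sided rectifying, smoothing, then sampling at a lower rate.

    smooth_len: length in samples of the moving average (assumed smoother).
    step: keep every step-th smoothed sample.
    """
    rectified = [abs(v) for v in band_signal]
    smoothed = []
    acc = 0.0
    for n, v in enumerate(rectified):
        acc += v
        if n >= smooth_len:
            acc -= rectified[n - smooth_len]  # drop the sample leaving the window
        smoothed.append(acc / min(n + 1, smooth_len))
    return smoothed[::step]


# Assumed test signal, standing in for the output of one bandpass filter.
fs = 16000       # sampling rate in Hertz
carrier = 2000   # frequency within the band, in Hertz
pitch = 100      # pitch frequency in Hertz
band = [
    (1.0 + 0.5 * math.cos(2 * math.pi * pitch * n / fs))
    * math.sin(2 * math.pi * carrier * n / fs)
    for n in range(1600)
]

# Smooth over one carrier period (8 samples) and keep every 8th sample.
envelope = detect_envelope(band, smooth_len=8, step=8)
```

With these assumed values the detected envelope repeats every 20 output samples, that is, once per pitch period of 10 milliseconds, although the carrier itself has been removed.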

Array 42 are the respective comb filters that have been discussed with respect to FIG. 2. Note that all channels have comb filtering, also those not provided with envelope detection means. Moreover, all comb filters preferably have uniform structure in that the inter-teeth distance equals actual pitch period and teeth heights have the same pattern. Array 52, as counterpart to array 32, has modulation of the filtered signal by the respective envelopes detected earlier in array 32. The relative interconnection feeding the modulation-controlling signal from array 32 to array 52 has been suppressed for brevity. Of course, channels that had no envelope detection now also go without modulation-by-envelope. The outputs of all respective channels are combined onto output 64.

Now, the above discloses FIG. 3 on a functional level. Actual realization on the level of electronic circuitry has not been shown, such as synchronization, signal definition, electronic realization, etcetera. Such detailing is left to the skilled art technician.

I claim:
 1. A method for processing source sound for therein enhancing wanted sound with respect to unwanted sound, said method comprising the steps of: distributing said source sound over a plurality of bandpass filters in as many channels in parallel; in each channel applying a respective filter means for preferentially filtering the wanted sound with respect to the unwanted sound in that channel's frequency band; aggregating output signals of said channels to an enhanced output sound, characterized by: feeding each bandpass filter's output to an envelope detecting means to feed that channel's filter means; feeding each respective filter means' output to an envelope modulating means to generate that channel's output signal.
 2. A method as claimed in claim 1, wherein said filter means comprise comb filter means.
 3. A method as claimed in claim 1 wherein said wanted sound is human speech sound.
 4. A method as claimed in claim 1, for enhancing a particular musical instrument for isolating or subtracting thereof with respect to any further musical instrument.
 5. A source sound processing apparatus for use in enhancing wanted sound with respect to unwanted sound according to a method as claimed in claim 1, said apparatus comprising a first plurality of channels assigned to respective contiguous frequency bands, said apparatus comprising distributing means for distributing said source sound over said channels, each channel comprising: bandpass filter means at a frequency of the associated channel; envelope detecting means fed by the channel's bandpass filter means; comb filter means fed by the channel's envelope detecting means; envelope modulating means fed by the channel's filter means; said apparatus furthermore having output means fed by outputs of all channels in parallel.
 6. An apparatus as claimed in claim 5, and having supplementary channel means at a frequency that is lower than and contiguous to the frequency band of said first plurality of channels combined, any supplementary channel in said supplementary channel means being fed by said distributing means and comprising bandpass filter means at a frequency of the associated supplementary channel and comb filter means fed by the channel's bandpass filter means, and also feeding said output means.
 7. An apparatus as claimed in claim 6, wherein said envelope detecting means comprise down-sampling means and said envelope modulating means comprise up-sampling means.
 8. An apparatus as claimed in claim 5, wherein said comb filter means have mutually uniform filter characteristics, at an inter-teeth spacing that substantially equals an instantaneous fundamental frequency of said wanted sound.
 9. A method as claimed in claim 2 wherein said wanted sound is human speech sound.
 10. A method as claimed in claim 2, for enhancing a particular musical instrument for isolating or subtracting thereof with respect to any further musical instrument.
 11. A method as claimed in claim 3, for enhancing a particular musical instrument for isolating or subtracting thereof with respect to any further musical instrument.
 12. An apparatus as claimed in claim 6, wherein said comb filter means have mutually uniform filter characteristics, at an inter-teeth spacing that substantially equals an instantaneous fundamental frequency of said wanted sound.
 13. An apparatus as claimed in claim 7, wherein said comb filter means have mutually uniform filter characteristics, at an inter-teeth spacing that substantially equals an instantaneous fundamental frequency of said wanted sound.